ABSTRACT


Background: 
Breast cancer remains a primary global health concern, emphasizing the critical need for accurate 
diagnostic tools. This study focuses on developing a precise method for classifying breast cancer images 
using a specifically designed Convolutional Neural Network (CNN). The research employs the BreakHis 
dataset for training and evaluation, comprising high-resolution histopathological images of breast biopsy 
specimens stained with hematoxylin and eosin. 

Methods Used: 
The unique CNN architecture incorporates convolutional layers, max-pooling layers, dropout layers, and 
batch normalization, tailored to capture intricate patterns distinguishing between benign and cancerous 
breast tissues. Comprehensive data preprocessing is implemented, involving label extraction from 
filenames and augmentation techniques to enhance the training set. The CNN model is trained with the
Adam optimizer and binary cross-entropy loss, and is evaluated with binary accuracy and ROC-AUC.
Early stopping and learning-rate reduction callbacks are integrated into the training process to
optimize model performance.

Results Achieved: 
The trained CNN model is assessed on a separate test dataset, and performance metrics, including ROC- 
AUC, accuracy, and a confusion matrix, are provided. The findings demonstrate that the custom CNN 
reliably categorizes breast cancer images, suggesting its potential as a valuable tool for automated breast 
cancer diagnosis. Notably, the study reports a high ROC-AUC value (0.98051) and satisfactory accuracy 
(0.93285), indicating the effectiveness of the custom CNN for breast cancer histopathology image 
categorization. 

Concluding Remarks: 
This work underscores the significance of tailored CNN architectures in enhancing the precision of breast 
cancer diagnostics, contributing to the ongoing efforts to leverage machine learning in histopathological 
image processing. The promising outcomes of the proposed approach set the stage for further 
advancements in computer-aided diagnostics and medical image analysis. The reported high ROC-AUC 
value and accuracy affirm the efficiency of the custom CNN, supporting its potential application in real- 
world breast cancer diagnostic scenarios. 


Keywords: Breast Cancer, Global Health, Convolutional Neural Network (CNN), Diagnostic Instruments,
BreakHis dataset.


1. INTRODUCTION 


Breast cancer is a significant global health concern,
demanding modern and precise diagnostic
technologies for successful identification and
classification. The combination of machine learning
with medical imaging has shown promise in recent
years for improving the precision and effectiveness
of breast cancer diagnosis. Convolutional Neural
Networks (CNNs) are among the most effective
methods for image analysis tasks. A typical
methodology involving a CNN for breast cancer
prediction is shown in Fig 1. This work aims to
improve diagnostic processes by developing and
applying a customized CNN for classifying breast
cancer images. The importance of an accurate breast
cancer diagnosis is hard to overstate, since early
identification is essential for prompt action and
better patient outcomes. Conventional diagnostic
techniques frequently depend on the subjective and
time-consuming histological analysis of biopsy
specimens.


The potential to transform diagnostic precision lies 
in integrating machine learning techniques and 
exceptionally specialized CNNs designed to handle 
the intricacies of image data related to breast cancer.

Fig 1. Different Types of Brain Tumours
Source: [9]


This research leverages the BreakHis dataset, which
comprises high-resolution histopathology images
of breast biopsy tissues stained with hematoxylin and
eosin. This dataset offers a firm basis on which to
train and assess the proposed custom CNN
architecture. The CNN design combines
convolutional layers, max-pooling layers, dropout
layers, and batch normalization, and is engineered to
capture the subtle patterns that distinguish benign
from malignant breast tissue [2]. Rigorous data
preprocessing, including label extraction and
augmentation, supports the model's effectiveness.
Training uses the binary cross-entropy loss function
and the Adam optimizer, with binary accuracy and
ROC-AUC as performance measures. Early stopping
and learning-rate reduction are also included to
enhance the CNN's performance. The trained model
is assessed on a separate test dataset using a
confusion matrix, accuracy, and the ROC-AUC
value. The promising results of this work highlight
how custom CNN architectures can be used to
improve and automate image classification for
breast cancer histopathology. This study advances
the field of medical image analysis while
highlighting the broader implications of using
cutting-edge machine learning methods in
computer-aided diagnosis. The results indicate that
the proposed custom CNN is an effective method for
classifying breast cancer, as evidenced by the high
ROC-AUC value and good accuracy. This suggests
that further developments in automated diagnostic
approaches are possible.


A. Purpose of the study 

The primary goal of this research is to tackle the 
pressing worldwide health issue of breast cancer by 
the introduction and application of sophisticated 
diagnostic technologies that capitalize on the 
combination of medical imaging and machine 
learning. Convolutional neural networks, or CNNs, 
are widely acknowledged as practical image- 
processing tools. This research aims to improve the 
efficacy and precision of breast cancer diagnosis by 
creating and utilizing a customized CNN architecture. 


The study highlights the shortcomings of traditional 
diagnostic techniques, which frequently rely on 
subjective and time-consuming histological 
interpretation of biopsy materials, in light of the 
critical relevance of early breast cancer identification 
for better patient outcomes. Incorporating machine
learning methods, namely specialized CNNs built to
handle the complexities of breast cancer image data,
can provide substantial improvements in diagnostic
precision. The proposed custom CNN architecture is
trained and assessed on the BreakHis dataset, which
comprises high-resolution histopathology images of
breast biopsy tissues stained with hematoxylin and
eosin. The CNN's architecture, which includes batch
normalization, max-pooling layers, dropout layers,
and convolutional layers, is designed to pick up on
the minute patterns that distinguish malignant from
benign tissue.

The work uses rigorous data preparation methods,
such as label extraction and augmentation, to
support the model's efficacy. The training phase uses
the Adam optimizer and the binary cross-entropy
loss function, with binary accuracy and ROC-AUC
as performance measures. Early stopping and
learning-rate reduction are combined to improve the
CNN's performance. The trained model is assessed
on a separate test dataset; this analysis includes a
confusion matrix, an accuracy evaluation, and the
ROC-AUC value. In addition to advancing the field
of medical image analysis, the study's encouraging
findings highlight the broader implications of
utilizing state-of-the-art machine learning
techniques in automated diagnosis. The high
ROC-AUC value and good accuracy support the
study's conclusion that the proposed custom CNN
architecture is a valid and dependable approach to
breast cancer classification. These results highlight
the effect of custom CNN architectures on breast
cancer histopathology image classification and
indicate the possibility of future advancements in
automated diagnostic techniques.
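
The two training safeguards named above, early stopping and learning-rate reduction, can be illustrated with a small framework-free simulation. This is only a sketch under stated assumptions: the function `run_callbacks`, the patience values, the reduction factor, and the validation-loss sequence are all hypothetical and are not taken from the study's training configuration.

```python
def run_callbacks(val_losses, lr=1e-3, reduce_patience=2, factor=0.5,
                  stop_patience=4, min_lr=1e-6):
    """Replay a validation-loss history and apply two rules:
    - reduce the learning rate after `reduce_patience` epochs without improvement
    - stop training after `stop_patience` epochs without improvement
    All default values are illustrative assumptions."""
    best = float("inf")
    since_best = 0      # epochs since the best validation loss
    since_reduce = 0    # epochs without improvement since the last LR change
    history = []
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, since_best, since_reduce = loss, 0, 0
        else:
            since_best += 1
            since_reduce += 1
            if since_reduce >= reduce_patience:
                lr = max(lr * factor, min_lr)
                since_reduce = 0
        history.append((epoch, loss, lr))
        if since_best >= stop_patience:
            break       # early stopping: later epochs are never run
    return best, history

# Hypothetical validation losses: improvement stalls after epoch 2.
losses = [0.9, 0.7, 0.6, 0.62, 0.61, 0.63, 0.64, 0.60, 0.5]
best_loss, history = run_callbacks(losses)
for epoch, loss, lr in history:
    print(f"epoch {epoch}: val_loss={loss:.2f} lr={lr:.6f}")
```

With these assumed settings the learning rate is halved twice and training halts before the last two epochs, which shows the trade-off of a short patience: a later improvement can be missed.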


B. Aim 
The main aim of this project is to classify breast
cancer images as either benign or malignant using a
custom CNN.


C. Objectives

• Build a Convolutional Neural Network (CNN)
specifically designed to accurately classify
breast cancer images into benign and
malignant categories.

• Apply robust data preparation approaches, such
as label extraction and augmentation techniques,
to improve the model's ability to handle a
variety of histopathological images.

• Train and assess the custom CNN architecture
on the BreakHis dataset, which consists of
high-resolution images of breast biopsy tissues
stained with hematoxylin and eosin.

• Utilise essential CNN elements such as batch
normalization, max-pooling, dropout, and
convolutional layers to capture complex patterns
representing benign and cancerous breast tissues.

• Use binary cross-entropy loss and the Adam
optimizer during the training phase, with
binary accuracy and ROC-AUC as performance
measures.


D. Research Questions 


1. What architectural strategies may ensure that 
breast cancer images are accurately classified 
into benign and malignant categories using 
Convolutional Neural Networks (CNNs)? 

2. How may robust label extraction and 
augmentation methods for data preparation 
improve the capacity of the custom CNN to 
process various histological pictures related to 
breast cancer? 

3. How can the training and assessment of the 
suggested custom CNN architecture benefit 
from using the BreakHis dataset, which consists 
of high-resolution breast biopsy tissues stained 
with hematoxylin and eosin? 

4. What role do crucial CNN elements like
convolutional layers, max-pooling layers,
dropout layers, and batch normalization play in
capturing intricate patterns representing benign
and cancerous breast tissues?

5. How do the binary cross-entropy loss, the Adam
optimizer, and performance measures like binary
accuracy and ROC-AUC affect the CNN's
performance during the training stage of breast
cancer image classification?


In terms of global health, the diagnosis of breast 
cancer is a matter of great importance and urgency. 
Breast cancer is one of the most common cancers in
the world and the leading cause of cancer death
among women. Since early detection greatly improves
treatment outcomes and patient survival rates, there 
is an urgent need for accurate and timely diagnosis. 
Unfortunately, subjective and time-consuming 
histological analysis of biopsy specimens is a 
common component of conventional diagnostic 
techniques, which causes delays in diagnosis and 
treatment initiation. Moreover, the intricacy of breast 
cancer pathology presents difficulties for 
conventional diagnostic techniques since it 
necessitates expertise and is subject to interpretation 
variability when differentiating between benign and 
malignant tissues. Due to its intrinsic subjectivity, 
misclassifications may arise, which could lead to 
poor treatment choices and unfavourable patient 
outcomes. 

Within this framework, combining machine learning 
with medical imaging shows promise as a way to 
improve the accuracy and efficiency of diagnosis. 
Given their impressive performance in image 
analysis tasks, convolutional neural networks (CNNs) 
may be able to address some of the difficulties 
associated with breast cancer diagnosis. It is possible 
to automate and optimise the categorization of breast 
cancer images by utilising sophisticated 
computational techniques, such as customised CNN 
architectures trained on massive datasets. 

This introduction has outlined the urgent need for
cutting-edge diagnostic technologies that can
overcome the drawbacks of conventional approaches.
The project intends to transform diagnostic 
procedures and enhance patient outcomes by creating 
and implementing a customised CNN designed 
especially for breast cancer image classification. The 
significance of this issue stems from its direct impact 
on people's health and well-being, underscoring the 
need to advance diagnostic methodologies in order to 
effectively combat breast cancer. 


2. LITERATURE REVIEW 


A. Recent Studies 

Breast cancer is a common type of cancer that starts 
in the breast cells and has the potential to be fatal. 
While it can also happen to men, women are far more 
likely to experience it. Breast cancer is characterized 
by an uncontrolled growth of abnormal cells in the 
breast tissue. If treatment is not received, the cancer
may spread to neighbouring tissues and appear as a
lump or mass. Improving treatment results largely
depends on early diagnosis through screening, such
as mammography, and developments in diagnostic
methods, such as histopathological image analysis.
In this direction, [3] discusses the difficulties in
histopathological cancer detection, highlighting the
subjective and time-consuming nature of manual
examination. The authors propose a hybrid
convolutional and recurrent deep neural network to
improve and automate breast cancer histopathology
image classification, as shown in Fig 2. The
technique preserves both the short- and long-term
spatial correlations between image patches by
combining the benefits of convolutional and
recurrent networks. In their experiments the method
outperforms state-of-the-art techniques, with an
average accuracy of 91.3% on a four-class
classification problem. Additionally, the authors
provide a dataset of 3771 histological images of
breast cancer that reflects the diversity of the disease
across age groups and subtypes and offers a valuable
resource to the scientific community.


Fig 2. Proposed Method Source: [3] 


Similarly, [4] highlights the difficulties in the
histological analysis of breast cancer and the
time-consuming, subjective nature of manual
diagnosis. For the classification of breast cancer
histopathology images into carcinoma and
non-carcinoma, the authors propose an ensemble
deep learning technique. Four models are trained
based on pre-trained VGG16 and VGG19
architectures, and an ensemble is formed by
averaging predicted probabilities. The ensemble of
optimized VGG16 and VGG19 models exhibits
competitive classification performance, with a
sensitivity of 97.73% for carcinoma and an overall
accuracy of 95.29%. The architecture of VGG 16 is
shown in Fig 3. The experimental results
demonstrate the effectiveness of the proposed deep
learning approach in automating the classification of
difficult histopathology images, particularly
carcinoma images.

Fig 3. VGG 16 Architecture Source: [4]


Breast cancer is the second most common cancer in
women worldwide, accounting for a large percentage
of newly diagnosed cases. In this context, [5] seeks
to create an accurate algorithm that uses biopsy
images to identify breast cancer. The study uses a
deep learning strategy, building a Convolutional
Neural Network with transfer learning on a library
of breast cancer images. The accuracy obtained is
higher than 96%, which is better than that of other
state-of-the-art algorithms. This highlights the
algorithm's potential to aid considerably in
diagnosing breast cancer and to support early
detection for better patient outcomes. Due to their
high prevalence and fatality rates, canine mammary
tumours (CMTs) are valuable models for studying
human breast cancer. Accordingly, [6] discusses how
difficult and time-consuming it is to diagnose human
breast cancer and CMTs via histological
investigation. Introducing the first dataset of CMT
histopathology images (CMTHis), the paper
presents a framework based on VGGNet-16 for
automatic classification, as shown in Fig 4. The
system uses support vector machines and is
evaluated on the CMT and human breast cancer
datasets. It obtains mean accuracies of 97% and 93%
for binary classification of human breast cancer and
CMT, respectively. The study highlights the
potential of the suggested approach for automated
diagnosis in veterinary and human medical contexts
while validating its efficacy.

Fig 4. Histopathology Image Classification Framework (CNN Feature Extraction with VGGNet-16)
Source: [6]

Another study [7] explores the diagnosis of breast
cancer from histopathology images by contrasting
deep learning (DL) and conventional machine
learning (CML) techniques, as shown in Fig 5. For
DL, the study fine-tunes VGG-19 on histopathology
images through a transfer learning approach.
Evaluation on the BreaKHis dataset and validation
on KIMIA Path960 indicate that DL outperforms
CML, obtaining accuracies between 94.05% and
98.13% for binary classification and between
76.77% and 88.95% for eight-class classification.
Visual interpretation of learnt features, such as
attention maps, enhances clinical interpretability
and increases confidence in DL techniques as
trustworthy instruments for breast cancer diagnosis.
[8] presents a review of breast cancer image
classification. The review highlights the application
of artificial deep neural networks across multiple
medical imaging modalities, focusing on the
categorization of cancer. Analyzing publications
from eight repositories, the review evaluates factors
such as imaging modalities, datasets, preprocessing
approaches, neural network types, and performance
measures. Histopathologic images and mammograms
are frequently employed, and available databases are
used in 55% of investigations. Preprocessing
methods, including scaling and normalization, are
widely used. Many studies use convolutional neural
networks (CNNs), often starting from pre-trained
networks. Accuracy, area-under-the-curve,
sensitivity, precision, and F-measure are examples of
evaluation metrics. The review lists ten open
research issues, offering a comprehensive resource
for both novices and advanced researchers in deep
learning-based breast cancer categorization.

Fig 5. Proposed DL-Based Architecture Source: [7]


The method proposed in [9] serves two purposes.
First, it investigates different deep learning models
for categorizing histopathology images of breast
cancer and identifies the best models for binary,
four-class, and eight-class classification. Model
accuracy is affected by data augmentation,
preprocessing, and transfer learning techniques.
Second, it evaluates state-of-the-art models
(ResNeXt, Dual Path Net, SENet, NASNet) on the
BreakHis and BACH datasets, which have not
received much attention in previous research. Better
results were obtained with Inception-ResNet-V2 for
eight-class and binary classification. The work
provides a thorough analysis and discussion of the
experimental conditions used in investigations on
histopathological images of breast cancer. Similarly,
[10] offers Pa-DBN-BC, a novel patch-based deep
learning algorithm utilizing a Deep Belief Network
(DBN), for identifying and classifying breast cancer
in histopathological images, as shown in Fig 6. The
method consists of unsupervised pre-training and
supervised fine-tuning stages for feature extraction
from image patches. Patch classification uses
logistic regression, which outputs a probability
matrix indicating positive (cancer) or negative
(background) samples. By automatically learning
the best features, the model surpasses conventional
techniques with an accuracy of 86% when trained
and evaluated on a variety of whole slide
histopathology image datasets.

Fig 6. PA-DBN-BC Model Structure
Source: [10]

[11] focuses on creating a computer-aided diagnostic
system based on a convolutional neural network
(AlexNet) for histopathology images of breast
cancer. Conventional feature extraction techniques
are time-consuming and not sufficiently accurate.
The proposed system uses an AlexNet model built on
the BreaKHis dataset, as shown in Fig 7.
Experiments are conducted at various magnification
factors. The high accuracy (95%), sensitivity (97%),
specificity (90%), and AUC (99.36%) of the results
demonstrate the efficacy of the suggested method in
differentiating between benign and malignant breast
cancer.

Fig 7. CNN & AlexNet Architecture
Source: [11]

B. Summary and Problem statement 


A brief review of recent studies on breast cancer
prediction leads to the following observations.

Manual diagnosis of breast cancer
histopathology images is challenging due to its
subjective and time-consuming nature. There is
a need for automated and objective methods to
improve diagnostic accuracy and efficiency.
Early diagnosis of breast cancer is crucial for
improving patient outcomes. Automated deep
learning-based approaches offer the potential to
enhance early detection and facilitate timely
interventions. Deep learning, particularly
convolutional neural networks (CNNs), has
shown promise in automating the classification
of breast cancer histopathology images. These
approaches have demonstrated high accuracy
and sensitivity, outperforming traditional
machine learning techniques.

The literature review shows that pre-trained
models are frequently used in breast cancer
detection research, whereas the strength of this
project is its use of a customized convolutional
neural network (CNN). The problem statement
can therefore be summarized as follows.
Pre-trained models, such as VGG16 and VGG19,
are primarily used in the currently available
literature on breast cancer detection for image
classification tasks. Despite their excellent
accuracy, these models might not be tailored to
the particulars of breast cancer histopathology
images. Consequently, a customized strategy
based on CNN architectures created especially to
handle the difficulties involved in breast cancer
diagnosis is required. By creating a customized
CNN for breast cancer diagnosis and prediction,
this research seeks to close this gap. It aims to
increase the precision and generalizability of
breast cancer detection models, ultimately
leading to more efficient and dependable
diagnostic instruments in clinical practice. It
does this by utilizing the advantages of
customized CNNs, such as their adaptability in
model architecture and capacity to be optimized
for the particular task at hand.


3. METHODOLOGY 
Fig 8. Proposed Methodology


A systematic approach that includes data collection,
image preprocessing, creation of a custom CNN
model, prediction, and output analysis is used to
predict breast cancer, as shown in Fig 8. The
meticulous completion of each phase is crucial to the
predictive model's effectiveness.

Journal of Theoretical and Applied Information Technology
31st March 2024. Vol.102. No 6
© Little Lion Scientific
ISSN: 1992-8645    E-ISSN: 1817-3195

In this research, we focus on using a custom
convolutional neural network (CNN) for the
detection and prediction of breast cancer. The
literature review revealed that many existing studies
in this area have utilized pre-trained models, such as
VGG16 and VGG19, for breast cancer image
classification. However, the strength of our research
lies in the development and implementation of a
custom CNN architecture specifically tailored to
address the challenges and nuances of breast cancer
detection. By designing a custom CNN, we aim to
improve the accuracy and generalizability of the
classification model, particularly in scenarios where
pre-trained models may not perform optimally. This
approach allows the network architecture to be
customized to better capture the unique features of
breast cancer histopathology images, potentially
leading to more accurate and reliable predictions.
Compared with pre-trained models, employing a
custom CNN for breast cancer detection and
prediction has a number of advantages. First, a
custom CNN can be tailored to the task at hand and
can take into consideration the distinct features of
breast cancer histopathology images. This gives the
model architecture more flexibility, which could
result in better accuracy and performance than a
pre-trained model that was not designed for this kind
of work. Second, creating a custom CNN from
scratch gives more control over the training
procedure and makes it possible to adjust the model
to the available dataset. This can be especially
helpful when only a small dataset is available or
when the data distribution is very different from that
of the dataset used to train the pre-trained model.
Researchers can adjust a custom CNN to more
closely match the features of the dataset, which
could improve performance and generalization.
Furthermore, employing a custom CNN can help
avoid biases or restrictions present in pre-trained
models. Pre-trained models are frequently trained on
large, diverse datasets, which may not adequately
represent the subtleties of a given problem area. By
training a custom CNN, researchers can make sure
the model is tuned specifically for breast cancer
detection, which could result in more accurate and
trustworthy outcomes.


A. Data Collection 

Careful data gathering is the first step in the
prediction of breast cancer. In this context, medical
imaging data, especially mammography, is collected
from several sources. The dataset includes images of
breast tissues labelled with the corresponding
diagnostic results, such as benign or cancerous. A
comprehensive dataset is essential for training a
robust model, ensuring that the CNN learns patterns
and features indicative of breast cancer across
different circumstances and patient profiles.
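
The abstract notes that labels are extracted from filenames. A minimal sketch of that step follows, assuming BreakHis-style names in which the second underscore-separated field is `B` (benign) or `M` (malignant). The example filenames and the helper `label_from_filename` are hypothetical; the parsing rule actually used in this study is not specified.

```python
import os

# Assumed naming pattern (not confirmed by the paper):
#   <procedure>_<class>_<subtype>-<year>-<slide>-<magnification>-<sequence>.png
# where <class> is "B" for benign or "M" for malignant.
CLASS_CODES = {"B": 0, "M": 1}

def label_from_filename(path):
    """Return 0 for benign or 1 for malignant, based on the class field."""
    name = os.path.basename(path)
    fields = name.split("_")
    if len(fields) < 3 or fields[1] not in CLASS_CODES:
        raise ValueError(f"unrecognized filename pattern: {name}")
    return CLASS_CODES[fields[1]]

# Hypothetical example filenames following the assumed pattern.
examples = [
    "SOB_B_A-14-22549AB-40-001.png",
    "data/SOB_M_DC-14-2523-100-005.png",
]
labels = [label_from_filename(p) for p in examples]
print(labels)
```

Raising an error on unexpected names, rather than assigning a default class, keeps mislabelled samples out of the training set.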


B. Image Preprocessing 

The next crucial stage after gathering the data is
image preprocessing. Several steps improve the
quality and relevance of the images for model
training. To help training converge, pixel values are
rescaled to a standard range, usually [0, 1].
Furthermore, data augmentation methods, including
rotations, flips, and random brightness
modifications, are used. By exposing the model to
varied views of breast tissue, these augmentations
add heterogeneity to the dataset and help the model
generalize.
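
The rescaling and augmentation steps described above can be sketched on a toy grayscale image stored as nested lists. This is an illustration under assumptions, not the study's pipeline: rotations are restricted to multiples of 90 degrees to avoid interpolation, the brightness range of ±0.1 is arbitrary, and all function names are hypothetical.

```python
import random

def rescale(image):
    """Map 8-bit pixel values (0-255) to the range [0, 1]."""
    return [[px / 255.0 for px in row] for row in image]

def horizontal_flip(image):
    return [row[::-1] for row in image]

def vertical_flip(image):
    return image[::-1]

def rotate90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]

def adjust_brightness(image, delta):
    """Add delta to every pixel and clip the result to [0, 1]."""
    return [[min(1.0, max(0.0, px + delta)) for px in row] for row in image]

def augment(image, rng):
    """Apply a random combination of flips, rotations and a brightness shift."""
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    if rng.random() < 0.5:
        image = vertical_flip(image)
    for _ in range(rng.randrange(4)):      # 0 to 3 quarter turns
        image = rotate90(image)
    return adjust_brightness(image, rng.uniform(-0.1, 0.1))

raw = [[0, 51], [102, 255]]                # toy 2x2 grayscale image
scaled = rescale(raw)
augmented = augment(scaled, random.Random(0))
print(scaled)
print(augmented)
```

Augmentation is applied only to training images in typical practice; validation and test images are rescaled but otherwise left unchanged so that evaluation reflects the original data.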


C. Custom CNN Model 

Creating a convolutional neural network (CNN)
model is the foundation of the breast cancer
prediction methodology. The architecture is
constructed to extract hierarchical information from
the input images. The CNN is made up of
convolutional blocks, each containing convolutional
layers for feature extraction, max-pooling layers for
spatial reduction, and dropout layers for
regularisation. Fully connected layers at the end of
the model combine the extracted features to produce
the final classification. Before the dense layers,
batch normalization is used to stabilize the training
process, and global average pooling is used to
reduce spatial dimensions further. Finally, the model
is compiled with an optimizer and a loss function
suitable for binary classification.
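
To make the layer sequence concrete, the sketch below traces tensor shapes and parameter counts through one possible instance of such a network. Every specific is an assumption introduced for illustration: a 224x224x3 input, three blocks with 32, 64 and 128 filters, 3x3 convolutions with same padding, 2x2 pooling, a 64-unit dense layer, and the placement of batch normalization after global average pooling. The paper does not state these values.

```python
def conv_params(in_ch, out_ch, k=3):
    """Weights plus biases of a k x k convolution."""
    return k * k * in_ch * out_ch + out_ch

def batchnorm_params(ch):
    """Scale, shift, moving mean and moving variance for each channel."""
    return 4 * ch

def dense_params(in_units, out_units):
    return in_units * out_units + out_units

def describe(input_shape=(224, 224, 3), filters=(32, 64, 128), dense_units=64):
    """Return (layer name, output shape, parameter count) for each layer."""
    h, w, c = input_shape
    layers = []
    for f in filters:
        # Same padding keeps height and width; only the channel count changes.
        layers.append((f"conv3x3-{f}", (h, w, f), conv_params(c, f)))
        c = f
        h, w = h // 2, w // 2              # 2x2 max pooling halves each side
        layers.append(("maxpool2x2", (h, w, c), 0))
        layers.append(("dropout", (h, w, c), 0))
    layers.append(("global_avg_pool", (c,), 0))
    layers.append(("batch_norm", (c,), batchnorm_params(c)))
    layers.append((f"dense-{dense_units}", (dense_units,), dense_params(c, dense_units)))
    layers.append(("dense-1-sigmoid", (1,), dense_params(dense_units, 1)))
    return layers

layers = describe()
total_params = sum(params for _, _, params in layers)
for name, shape, params in layers:
    print(f"{name:18s} {str(shape):18s} {params}")
print("total parameters:", total_params)
```

Under these assumptions the network has 102,081 parameters. Global average pooling keeps the dense layers small because they see one value per channel instead of a flattened 28x28x128 feature map.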


D. Prediction using CNN Model 

The prediction phase involves feeding new,
previously unseen mammography images into the
trained CNN model. Using the features it has learnt,
the model processes these images through its layers
and predicts whether or not breast cancer is present.
The output layer's sigmoid activation function
converts the model's raw outputs into probabilities
that indicate the likelihood of malignancy. This
stage is essential for evaluating the model's capacity
to provide precise predictions and generalize to
unseen data.
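
The role of the sigmoid output can be shown in a few lines, together with the binary cross-entropy loss that the study uses for training. The raw scores, the labels, and the 0.5 decision threshold below are hypothetical values chosen for illustration.

```python
import math

def sigmoid(z):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

logits = [-2.0, 0.0, 3.0]                  # hypothetical raw network outputs
probs = [sigmoid(z) for z in logits]       # malignancy probabilities
preds = [1 if p >= 0.5 else 0 for p in probs]   # 0.5 threshold is an assumption
loss = binary_cross_entropy([0, 1, 1], probs)

print([round(p, 3) for p in probs], preds, round(loss, 4))
```

The threshold need not be 0.5. In a diagnostic setting it may be lowered to reduce false negatives at the cost of more false positives.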


E. Final Output 

Analyzing the model's outputs is the last stage in the
approach for predicting breast cancer. Predictions are
compared against ground truth labels to assess the
model's accuracy, precision, recall, and F1 score.
The discriminating power of the model can also be
evaluated using Receiver Operating Characteristic
(ROC) curves and the Area Under the Curve (AUC)
metric. Additionally, a confusion matrix provides
insights into false positives and false negatives. This
in-depth analysis informs future adjustments to
increase the predictive accuracy of the model and
helps assess its dependability.
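
The metrics listed above can be computed directly from labels and predicted scores. The sketch below uses a small hypothetical set of six samples; the ROC-AUC is computed with the pairwise-ranking definition (the fraction of positive/negative pairs in which the positive sample receives the higher score), which is equivalent to the area under the ROC curve.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, with 1 = malignant."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp

def classification_metrics(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute ROC-AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical ground truth and predicted malignancy probabilities.
y_true = [0, 0, 1, 1, 1, 0]
scores = [0.1, 0.6, 0.8, 0.4, 0.9, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(confusion_matrix(y_true, y_pred), metrics, auc)
```

Accuracy, precision, recall and F1 depend on the chosen threshold, whereas ROC-AUC is threshold-independent, which is one reason the two kinds of measure are reported together.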

Preprocessing, model development, prediction, and
output analysis together make up the multifaceted
breast cancer prediction approach. The quality and
diversity of the dataset, the effectiveness of the
image preprocessing methods, and the structural
soundness of the customized model are all critical to
the predictive model's performance. The model's
dependability in practical applications is supported
by thoroughly verifying predictions against ground
truth labels, which in turn supports ongoing attempts
to improve breast cancer diagnosis and therapy.


F. Data Collection 

Data collection is a crucial stage for breast cancer
prediction research. This study uses the resources
offered by Kaggle, an open platform well known for
hosting a wide variety of datasets, including several
medical imaging datasets linked to breast cancer.
Acquiring the dataset entails accessing
mammography images together with well-labelled
diagnostic results that indicate whether a malignant
or benign condition is present. Building a
comprehensive dataset is made possible by the wide
range of patient profiles, imaging modalities, and
clinical variants included in the Kaggle collection.
This diversity is essential for developing a robust
predictive model that can identify complex patterns
linked to breast cancer in various contexts. Through
the collaborative study of the datasets made
available by the platform, researchers and data
scientists can advance the field of breast cancer
prediction by contributing to and benefiting from a
shared pool of knowledge. By improving data
quality and encouraging openness and cooperation
among scientists, this strategy ultimately aids in
creating more precise and broadly applicable breast
cancer prediction models.


G. DataSet 

The sizes of the training and test datasets are
essential factors that influence the predictive model's
effectiveness and generalizability in breast cancer
prediction. With a shape of (2582, 5), the training
dataset contains 2582 samples, each described by
five attributes. These attributes likely include a
range of clinical and imaging-related factors, which
serve as the input from which the custom
convolutional neural network (CNN) identifies
patterns that may suggest breast cancer. The larger
training dataset exposes the model to a broader range
of cases, which helps it generalize to new examples.
The test dataset, with a shape of (1251, 5), contains
1251 samples with the same five attributes. This set
acts as an impartial benchmark for assessing the
model's performance on new, unseen data. To
evaluate the model's robustness and make sure it can
produce correct predictions outside of the training
data, the sizes of the training and test datasets must
be balanced. The model's capacity to generalize is
affected by the relative sizes of the training and test
datasets; therefore, maintaining a sensible
proportion between them is crucial for maximizing
performance in breast cancer prediction.


H. Image Preprocessing 

The preprocessing of the breast cancer prediction
dataset entails transforming each image to a
standardized size (224 x 224 pixels, the input size of
the network). This stage guarantees consistency and
interoperability across the training, validation, and
testing stages. Furthermore, the dataset is divided
into training, validation, and test sets, enabling
efficient model training, tuning, and assessment. The
uniform image size reduces processing overhead and
lets the images be fed directly into the customized
neural network, which improves the Model's ability to
extract pertinent features and patterns suggestive of
breast cancer.


I. Custom Convolutional Neural Network 

For the prediction of breast cancer, the custom
Convolutional Neural Network (CNN) architecture
is a potent tool. Its hierarchical structure is designed
to pick up on minute details in breast cancer
images, from convolutional blocks that extract local
characteristics to fully connected layers that capture
global patterns. When identifying minute anomalies
that may be signs of breast cancer, the CNN's capacity
to automatically learn pertinent properties such as
textures, edges, and spatial hierarchies is essential.
Batch normalization layers and data augmentation
improve the Model's robustness and generalization,
and help limit overfitting during training. Max-pooling
layers reduce the spatial dimensions while retaining
the most prominent features. The sigmoid activation of
the output layer offers a probabilistic interpretation,
which enables the Model to estimate the probability of
malignancy.

In practice, 224 x 224 breast cancer images are fed
into the CNN, ensuring uniform dimensions for easy
processing. The Model can identify various
anomalies in breast tissue because of its extensive
training on varied images. By streamlining the
breast cancer prediction process and offering an
automated and precise tool for early diagnosis, this
CNN-based architecture is a helpful tool in
the ongoing attempts to enhance patient outcomes
and the diagnosis of breast cancer because of its
capacity to discern complicated patterns and adapt to
different imaging parameters.


J. CNN Architecture Model Summary


Table 1. CNN Architecture Model Summary

Model: "CustomCNN"

Layer (type)                                        Output Shape            Param #
random_brightness_1 (RandomBrightness)              (None, 224, 224, 3)     0
random_flip_1 (RandomFlip)                          (None, 224, 224, 3)     0
random_rotation_1 (RandomRotation)                  (None, 224, 224, 3)     0
rescaling (Rescaling)                               (None, 224, 224, 3)     0
batch_normalization (BatchNormalization)            (None, 224, 224, 3)     12
conv2d (Conv2D)                                     (None, 222, 222, 32)    896
max_pooling2d (MaxPooling2D)                        (None, 111, 111, 32)    0
dropout (Dropout)                                   (None, 111, 111, 32)    0
conv2d_1 (Conv2D)                                   (None, 109, 109, 64)    18496
max_pooling2d_1 (MaxPooling2D)                      (None, 54, 54, 64)      0
dropout_1 (Dropout)                                 (None, 54, 54, 64)      0
conv2d_2 (Conv2D)                                   (None, 52, 52, 128)     73856
max_pooling2d_2 (MaxPooling2D)                      (None, 26, 26, 128)     0
dropout_2 (Dropout)                                 (None, 26, 26, 128)     0
global_average_pooling2d (GlobalAveragePooling2D)   (None, 128)             0
dropout_3 (Dropout)                                 (None, 128)             0
dense (Dense)                                       (None, 256)             33024
dropout_4 (Dropout)                                 (None, 256)             0
dense_1 (Dense)                                     (None, 32)              8224
dense_2 (Dense)                                     (None, 1)               33


[Figure: layer-by-layer diagram of the Custom CNN showing the input and output shape of each layer, from random_brightness_1 (None, 224, 224, 3) to dense_2 (None, 1)]

Fig 9. The architecture of Custom CNN


Table 1 shows the Convolutional Neural Network
(CNN) model's architecture, including information
on each layer's output shape and number of
parameters. The same information is given in
pictorial format in Fig 9. The first layers comprise
data augmentation methods, including rotation,
flipping, and random brightness adjustment. These
methods all help to diversify the training dataset
without adding extra parameters. The Model then
integrates the necessary preprocessing layers:
rescaling, followed by batch normalization, which adds
12 parameters for normalization. To capture
complicated patterns, the convolutional layers
(Conv2D) gradually increase the number of filters
(32, 64, and 128) while extracting features from the
input. Each convolutional layer is followed by a
max-pooling layer, which reduces the spatial
dimensions, and a dropout layer, which is used for
regularisation. The Global Average Pooling layer
reduces the spatial dimensions to a flat
representation. The global feature representation is
aided by fully connected layers (Dense) of different
sizes (256, 32), and the final output layer, which has
a sigmoid activation, predicts the probability of
breast cancer. This architectural overview presents a
balanced and organized approach to predicting breast
cancer.
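
The output shapes and parameter counts in Table 1 can be recomputed by hand. The sketch below does so under assumptions that the paper does not state explicitly but that are consistent with the table: 3 x 3 convolution kernels with stride 1 and no padding, 2 x 2 max-pooling, and four values per channel for batch normalization. The helper names are illustrative and are not taken from the authors' code.

```python
# Sketch: recompute the output shapes and parameter counts listed in Table 1.
# Assumed (not stated in the paper): 3x3 kernels, stride 1, no padding,
# and non-overlapping 2x2 max-pooling.

def conv2d(shape, filters, kernel=3):
    """Return (output shape, parameter count) for an unpadded, stride-1 Conv2D."""
    h, w, c = shape
    out = (h - kernel + 1, w - kernel + 1, filters)
    params = kernel * kernel * c * filters + filters  # weights + biases
    return out, params

def max_pool(shape, pool=2):
    """Return the output shape of a non-overlapping max-pooling layer."""
    h, w, c = shape
    return (h // pool, w // pool, c)

def dense(units_in, units_out):
    """Return the parameter count of a fully connected layer."""
    return units_in * units_out + units_out  # weights + biases

shape = (224, 224, 3)
total = 4 * shape[2]  # batch normalization: 4 values per input channel
for filters in (32, 64, 128):
    shape, p = conv2d(shape, filters)
    total += p
    print("conv2d", shape, p)
    shape = max_pool(shape)
    print("max_pooling2d", shape, 0)

features = shape[2]  # global average pooling keeps one value per channel
for units in (256, 32, 1):
    p = dense(features, units)
    total += p
    print("dense", units, p)
    features = units
print("total parameters:", total)
```

Under these assumptions every row of Table 1 is reproduced, and the rows sum to 134,541 parameters. The augmentation, rescaling, pooling, and dropout layers contribute none.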


K. Activation Function 

Convolutional neural networks (CNNs) are used to
predict breast cancer, and the selection of activation
functions, Rectified Linear Unit (ReLU) and Sigmoid
in particular, is critical to the behaviour and
predictive power of the Model.

ReLU is a popular activation function that passes its
input through unchanged when the input is positive and
outputs zero otherwise, introducing non-linearity. It
is exceptionally well suited for CNNs due to its
simplicity and effectiveness in training. In the breast
cancer prediction model, ReLU activation in the
convolutional layers aids the network in learning
intricate patterns and features found in
histopathology images. This non-linearity improves the
Model's ability to identify minute details that may be
signs of cancer.

The last layer of binary classification models, by
contrast, usually uses the Sigmoid activation function.
In breast cancer prediction, Sigmoid activation
converts the Model's raw output into a probability
score ranging from 0 to 1, signifying the probability
that the image shows malignant tissue. This probability
helps physicians make decisions by making the Model's
predictions easier to interpret. When diagnosing
breast cancer, sigmoid activation plays a crucial role
in transforming the continuous output of the Model
into a probability that can be thresholded into a
binary classification differentiating between benign
and malignant cases.


To sum up, integrating Sigmoid and ReLU activation
functions enhances the efficacy and
comprehensibility of CNNs in predicting breast
cancer, permitting more precise and practically
applicable results.
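
The two activation functions discussed above are simple enough to write out directly. The following is a minimal sketch using only the Python standard library; the 0.5 decision threshold and the function names are assumptions made for illustration and are not stated in the paper.

```python
import math

def relu(x):
    """Rectified Linear Unit: pass positive inputs, output zero otherwise."""
    return max(0.0, x)

def sigmoid(x):
    """Map a real-valued score to a probability-like value between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # this branch avoids overflow for very negative x
    return z / (1.0 + z)

def classify(score, threshold=0.5):
    """Turn a raw output score into (probability, label).

    The 0.5 threshold is an assumed default, not a value from the paper.
    """
    p = sigmoid(score)
    return p, int(p >= threshold)

for score in (-3.0, 0.0, 2.0):
    print(score, relu(score), classify(score))
```

A raw score of zero maps to a probability of exactly 0.5, positive scores map above it, and negative scores below it, which is why thresholding the sigmoid output at 0.5 is equivalent to checking the sign of the raw score.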


L. Augmentation 

In breast cancer prediction, augmentation is vital in
strengthening the Model's robustness and
generalization. To illustrate, different augmentations
are applied to an example image from the training
dataset. Applying transformations such as rotations,
flips, and random brightness modifications to the
image makes the dataset more diverse. By introducing
variability, this augmentation method helps the Model
better adapt to a wide range of scenarios and, in the
end, enhances its capacity to identify patterns
suggestive of breast cancer across various patient
cases and imaging settings. Enhancing the Model's
performance on unseen data and preventing overfitting
are the two essential goals of augmentation.
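
The three kinds of transformation named above can be illustrated on a tiny grayscale image stored as a list of rows. This is a simplified sketch, not the authors' pipeline: rotations are restricted to quarter turns (the RandomRotation layer in Table 1 rotates by arbitrary angles), and the brightness range of plus or minus 30 and the flip probability of 0.5 are assumed values.

```python
import random

def flip_horizontal(image):
    """Mirror each row of a 2-D grayscale image (list of rows)."""
    return [row[::-1] for row in image]

def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def adjust_brightness(image, delta):
    """Add delta to every pixel and clip to the valid range [0, 255]."""
    return [[min(255, max(0, p + delta)) for p in row] for row in image]

def augment(image, rng):
    """Apply a random flip, a random number of quarter turns and a brightness shift."""
    if rng.random() < 0.5:              # assumed flip probability
        image = flip_horizontal(image)
    for _ in range(rng.randrange(4)):   # 0 to 3 quarter turns
        image = rotate_90(image)
    return adjust_brightness(image, rng.randint(-30, 30))  # assumed range

image = [[0, 50, 100],
         [150, 200, 250]]
rng = random.Random(0)  # fixed seed so the example is repeatable
for _ in range(3):
    print(augment(image, rng))
```

Each call produces a different variant of the same image while the label stays unchanged, which is how augmentation enlarges the effective training set without adding trainable parameters.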


4. RESULT AND ANALYSIS 


A. Confusion Matrix 

A crucial evaluation tool for classification models, 
such as those used in the prediction of breast cancer, 
is the confusion matrix. In the binary classification 
scenario inherent to breast cancer diagnosis, this 
matrix comprehensively captures the performance of 
the Model by categorizing it into four components: 
True Positive (correctly identified malignant cases), 
True Negative (correctly identified benign cases), 
False Positive (false alarms, benign cases 
misclassified as malignant), and False Negative 
(missed diagnoses, malignant cases misclassified as 
benign). Critical performance indicators, including 
accuracy, precision, recall, and the F1 score, are 
derived from these constituents. In the context of 
breast cancer prediction, this matrix provides a 
nuanced viewpoint on the Model's advantages and 
disadvantages, facilitating the evaluation of the 
Model's clinical significance and possible influence 
on patient outcomes. 

[Figure: confusion matrix; vertical axis: True label, horizontal axis: Predicted label, classes 0 and 1]

Fig 10. Confusion Matrix

For the two classes, Class 0 and Class 1, the
resulting confusion matrix provides insight into how
well the classification model performs, as shown in
Fig 10. The diagonal elements of the matrix represent
the true positive and true negative predictions, and
the off-diagonal elements represent the false
positives and false negatives. Notably, the high
number of true positives (852) indicates that the
Model is adept at correctly detecting occurrences of
Class 1. Nevertheless, the 57 false positives (Class 0
cases mistakenly classified as Class 1) show that
there is still room for improvement in the Class 0
prediction. This implies that the two classes can be
hard to tell apart in some cases, which could trigger
needless alerts for cases that are actually Class 0.

Furthermore, the 27 false negatives are cases where
Class 1 was mistakenly predicted as Class 0. This
misclassification is significant, particularly in the
context of breast cancer prediction, because these are
occasions on which the Model fails to detect actual
positive cases. The Model's usefulness depends on how
well it balances the reduction of false positives
against the reduction of false negatives. An
additional analysis that includes precision, recall,
and F1 score estimates would offer a more thorough
understanding of the Model's strengths and
shortcomings.
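
The precision, recall, and F1 score mentioned above follow directly from the confusion-matrix counts. The sketch below uses the reported true positives, false positives, and false negatives. The true-negative count is not reported in the text, so it is inferred here under the assumption that the matrix covers all 1251 test samples.

```python
# Counts reported for Fig 10. The true-negative count is NOT reported and is
# inferred under the assumption that all 1251 test samples are included.
TP, FP, FN = 852, 57, 27
TEST_SIZE = 1251
TN = TEST_SIZE - TP - FP - FN

def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

for name, value in classification_metrics(TP, TN, FP, FN).items():
    print(f"{name}: {value:.4f}")
```

Under that assumption the inferred true-negative count is 315 and the accuracy comes out at about 0.9329, in line with the reported 0.93285, with precision of about 0.937, recall of about 0.969, and an F1 score of about 0.953.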


[Figure: ROC curve for CustomCNN (AUC = 0.98); horizontal axis: False Positive Rate, vertical axis: True Positive Rate, positive label: 1]

Fig 11. ROC-AUC Curve

In the context of breast cancer prediction models, the
Area Under the Receiver Operating Characteristic
(ROC) Curve, or AUC-ROC, is a crucial
performance indicator. A high AUC value, like
the 0.98 reported here, signifies a strong
discriminatory capacity of the Model. The ROC curve
provides a detailed evaluation of the Model's
classification ability by visually representing the
trade-off between the true positive rate (sensitivity)
and the false positive rate across different threshold
settings, as shown in Fig 11. An AUC of 0.98 in the
prediction of breast cancer suggests that the Model
has a solid capacity to discriminate between benign
and malignant cases. The closer the AUC value is to 1,
the better the discriminating power of the Model.
With a high AUC value, a decision threshold can be
chosen that keeps both false positives and false
negatives low, balancing sensitivity and
specificity. This kind of performance is critical for
the prediction of breast cancer, as early detection and
treatment of the disease depend on the correct
diagnosis of malignancies. A high AUC-ROC value
indicates a well-performing model, which adds
credence to its clinical application and supports its
potential as a valuable tool for helping medical
professionals diagnose breast cancer accurately.
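
To make the threshold sweep behind the ROC curve concrete, the sketch below computes ROC points and the AUC for a small set of toy scores. The data are invented for illustration and do not come from the paper. The AUC is computed as the fraction of (positive, negative) pairs in which the positive sample receives the higher score, which equals the area under the curve.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def roc_points(labels, scores):
    """(false positive rate, true positive rate) pairs, one per distinct threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

labels = [1, 1, 0, 1, 0, 0]              # toy data, not from the paper
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]  # hypothetical predicted probabilities
print(roc_points(labels, scores))
print(roc_auc(labels, scores))
```

In this toy example eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9. An AUC of 0.98 therefore means that a randomly chosen positive case outscores a randomly chosen negative case about 98% of the time.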


[Figure: precision-recall curve for CustomCNN]

Fig 12. Precision-Recall Curve

Fig 12 shows the precision-recall curve for the breast
cancer custom CNN model. The curve for the
CustomCNN model is very close to the top-right
corner of the graph, which is the ideal location for a
precision-recall curve. This means that the Model
achieves high precision and high recall at the same
time. The curve is smooth and has no sharp drops,
suggesting that precision remains stable as recall
increases. The AP (average precision) for the Model is
0.99, which is very high. This means the Model
maintains, on average, high precision across recall
levels.
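
Average precision summarizes the precision-recall curve in a single number. A minimal sketch of one common definition follows: samples are ranked by score, and the precision measured at the rank of each positive sample is averaged. The toy data are invented for illustration, and the sketch assumes no tied scores.

```python
def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive sample.

    Assumes scores are distinct; ties would need explicit handling.
    """
    ranked = sorted(zip(scores, labels), reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)  # precision at this recall step
    return sum(precisions) / len(precisions)

labels = [1, 1, 0, 1, 0, 0]              # toy data, not from the paper
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]  # hypothetical predicted probabilities
print(average_precision(labels, scores))
```

Here the three positives sit at ranks 1, 2, and 4, giving precisions of 1, 1, and 0.75 and an average of about 0.917. A value of 0.99 indicates that almost all positives are ranked ahead of almost all negatives.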
Fig 13. Accuracy For Training And Validation Set

Fig 14. Loss For Training And Validation Set


Figs. 13 and 14 show the training accuracy,
validation accuracy, training loss, and validation loss.


The performance metrics obtained from the custom CNN
model indicate its effectiveness in predicting breast
cancer. The training loss of 0.4667 reflects the
Model's capacity to reduce errors during the training
phase. The training ROC AUC (Receiver Operating
Characteristic Area Under the Curve) score of 0.8138
reflects the Model's ability to discriminate between
positive and negative instances; higher values
indicate stronger discriminatory power. Furthermore, a
training binary accuracy of 0.8058 shows how well the
Model classifies cases into the two categories.

The validation results further support the robustness
of the Model. The validation loss of 0.4300 indicates
that the Model generalizes to data outside the
training set, while the ROC AUC value of 0.8703 shows
a higher discriminatory ability on the validation set
than on the training set. The validation binary
accuracy of 0.8421 demonstrates the Model's ability to
predict binary outcomes on data it was not trained on.

One key hyperparameter that affects the Model's
convergence during training is the learning rate (LR),
which had a value of 2.5000e-05. Taken together, these
metrics (loss, ROC AUC, binary accuracy, and learning
rate) provide a thorough picture of the custom CNN's
performance. This gives confidence in the custom CNN's
potential value for accurate breast cancer prediction
in real-world scenarios.
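
The loss values above are easier to read with the definition in hand. The following sketch assumes that the loss is the binary cross-entropy named in the abstract, averaged over samples and without additional regularization terms. The function names and toy inputs are illustrative and are not the authors' code.

```python
import math

def binary_cross_entropy(labels, probs, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def mean_true_class_probability(loss):
    """Geometric-mean probability assigned to the correct class for a given loss."""
    return math.exp(-loss)

# Toy predictions (hypothetical): two positives and one negative.
print(binary_cross_entropy([1, 0, 1], [0.9, 0.2, 0.6]))

# Reading the reported losses as an average confidence in the correct class.
for name, loss in (("training", 0.4667), ("validation", 0.4300)):
    print(name, mean_true_class_probability(loss))
```

Under these assumptions, the reported training and validation losses correspond to the Model assigning, in the geometric mean, a probability of roughly 0.63 and 0.65 respectively to the correct class.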


B. Comparison of Results 

The literature review highlights various deep
learning approaches for breast cancer detection and
prediction, with notable results in accuracy, AUC,
and precision-recall metrics. The hybrid convolutional
and recurrent deep neural network used by [3] achieved
an accuracy of 91.30%, demonstrating the potential of
combining different network architectures. The
ensemble of pre-trained VGG16 and VGG19 models used
by [4] achieved a higher accuracy of 95.29%,
showcasing the effectiveness of leveraging pre-trained
models. Transfer learning with CNNs, used by [5],
achieved an accuracy exceeding 96%, indicating the
benefits of using pre-trained models for feature
extraction. Additionally, [6] used VGG16 for feature
extraction and an SVM for classification, which
resulted in accuracies of 97% and 93% for binary
classification of human breast cancer and canine
mammary tumors, respectively. Transfer learning
with VGG19, used by [7], achieved accuracies
between 94.05% and 98.13%, highlighting the
effectiveness of transfer learning. Lastly, AlexNet,
used by [11], achieved an accuracy of 95% and an AUC
of 99.36%, showcasing the power of deep learning in
histopathology image analysis.

Comparing these results to our custom CNN model,
we note that our model achieved an accuracy of
93.29% and an AUC of 98%, which is slightly lower
than some of the top-performing techniques in the
literature. However, it is important to highlight that
our model is a custom CNN, unlike most of the other
approaches, which relied on pre-trained models. This
distinction is significant as it indicates that our
model was specifically designed and trained for breast
cancer detection and prediction, rather than being
adapted from models trained on unrelated tasks. While
our model's accuracy is slightly lower than that of
some pre-trained models, it demonstrates competitive
performance, especially considering its customized
architecture. Additionally, our model achieved an
average precision of 99% on the precision-recall
curve, indicating its ability to identify positive
cases with high precision. Overall, our custom
CNN model offers a distinct approach to breast
cancer detection and prediction, focusing on
customization and task-specific optimization, which
sets it apart from the pre-trained models used in the
literature.


5. CONCLUSION 


To sum up, the methodology presented for predicting
breast cancer is methodical and comprehensive; it
includes data gathering, image preprocessing, building
a bespoke CNN model, making predictions, and analyzing
the results. Training a robust custom CNN model starts
with carefully selecting datasets from sites like
Kaggle and applying rigorous image preprocessing
methods such as rescaling and data augmentation.
Convolutional blocks, dropout layers, and global
average pooling show how the CNN's architecture is
designed for hierarchical feature extraction. The
Model's ability to distinguish between benign and
malignant instances is demonstrated by the evaluation
metrics, which include the confusion matrix, AUC-ROC,
and precision-recall curve. Additionally, the Model's
performance is noteworthy for its potential clinical
relevance, as evidenced by its accuracy of 0.93285 and
ROC AUC of 0.98051. This methodology advances the
field of breast cancer prediction by providing a
valuable instrument for early diagnosis and
intervention in practical situations.


Future Research Directions 

• Enhancing Model Generalization: In
  spite of the custom CNN architecture's
  encouraging results in classifying images of
  breast cancer, more investigation is required
  to improve the model's generalisation over
  a wider range of datasets and clinical
  contexts. It might be possible to adapt the
  model to different imaging protocols and
  tissue preparation techniques that are
  frequently encountered in real-world
  scenarios by looking into transfer learning
  techniques or domain adaptation methods.

• Validation on Diverse Patient
  Populations: The majority of the images in
  the BreakHis dataset used in this study
  come from a particular demographic or
  geographic area. Future studies should
  confirm how well the customised CNN
  architecture performs on a wider range of
  patient populations to guarantee that it can
  be applied to a variety of age groups,
  ethnicities, and healthcare environments.